Training Software Engineering Agents and Verifiers with SWE-Gym

We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents.

SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language.
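
As a rough illustration of what one such task instance bundles together, here is a minimal sketch. The class name, field names, and example values are all hypothetical and are not taken from the SWE-Gym release; they only mirror the four components listed above (codebase, runtime environment, unit tests, natural-language task).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaskInstance:
    """Hypothetical container for one task; field names are assumptions."""

    instance_id: str              # unique identifier for the task
    repo: str                     # codebase the task belongs to
    problem_statement: str        # the task, specified in natural language
    environment_image: str        # reference to an executable runtime environment
    test_ids: Tuple[str, ...] = ()  # unit tests used to check a candidate fix


# Made-up example values, for illustration only.
example = TaskInstance(
    instance_id="example-1",
    repo="owner/project",
    problem_statement="Fix the crash when the input list is empty.",
    environment_image="example-image:latest",
    test_ids=("tests/test_core.py::test_empty_input",),
)
```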

We use SWE-Gym to train language-model-based SWE agents, achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets.

Figure 2: Breakdown of repositories

It is hard to get the unit tests running for each repository.

A reward model that predicts unit test results -> inference-time scaling (Figure 1 bottom)
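
One common way a reward model enables inference-time scaling is best-of-n selection: sample several candidate solutions, score each with the model, and keep the highest-scoring one. The sketch below assumes that setup; the note itself does not spell out the selection rule. The names `best_of_n` and `score_fn` are hypothetical, and `score_fn` stands in for a learned model that estimates how likely a candidate is to pass the unit tests.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score_fn: Callable[[str], float]) -> str:
    """Return the candidate with the highest score under score_fn.

    score_fn is a placeholder for a reward model; here it is any callable
    that maps a candidate to a number, where higher means more promising.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score_fn)


# Toy stand-in scorer (an assumption for illustration): prefer longer patches.
toy_candidates = ["fix A", "longer fix B", "fix C"]
chosen = best_of_n(toy_candidates, score_fn=len)
```

With this approach, sampling more candidates spends more compute at inference time without running the unit tests themselves.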